Fix scanning issues related to Unicode escapes in identifiers#61042
Fix scanning issues related to Unicode escapes in identifiers#61042graphemecluster wants to merge 4 commits intomicrosoft:mainfrom
Conversation
- Added the `scanIdentifierStart` method - Refactored `scanIdentifierParts` & `scanIdentifier` Implementing the above automatically fixes another issue that Unicode escapes & extended Unicode escapes are not recognised at the beginning of a RegExp group name.
graphemecluster
left a comment
There was a problem hiding this comment.
You might want to leave out the last commit when reviewing for a better diff.
| // "-" and ":" are valid in JSX Identifiers | ||
| (identifierVariant === LanguageVariant.JSX ? (ch === CharacterCodes.minus || ch === CharacterCodes.colon) : false) || | ||
| // "-" is valid in JSX Identifiers. ":" is part of JSXNamespacedName but not JSXIdentifier. | ||
| identifierVariant === LanguageVariant.JSX && ch === CharacterCodes.minus || |
There was a problem hiding this comment.
Changing this line doesn't affect everything in the test suite, but I am not sure if it affects code completion.
src/compiler/scanner.ts
Outdated
| ch = peekExtendedUnicodeEscape(); | ||
| if (ch >= 0 && isIdentifierPart(ch, languageVersion)) { | ||
| if (ch >= 0 && isIdentifierPart(ch, languageVersion, identifierVariant)) { | ||
| result += text.substring(start, pos); |
There was a problem hiding this comment.
This is the missing line that causes #61043.
| pos += charSize(ch); | ||
| return token; // Still `SyntaxKind.Unknown` | ||
| tokenFlags = TokenFlags.None; | ||
| return scanIdentifier(ScriptTarget.ESNext); |
There was a problem hiding this comment.
I tested and it does not affect anything in the test suite even if pos is not advanced.
| return token = SyntaxKind.Identifier; | ||
| } | ||
|
|
||
| function scanIdentifier(languageVersion: ScriptTarget, identifierVariant?: LanguageVariant | "RegExpGroupName") { |
There was a problem hiding this comment.
Alternatively, either an internal value can be added into LanguageVariant, or an enum called IdentifierVariant can be created, but for the latter case, I am not sure if isIdentifierPart and isIdentifierText should also be amended since it might be breaky, at least at type level.
| /(?<\u{D800}\u{DC00}>)\k<\u{D800}\u{DC00}>/; | ||
|
|
||
| !!! error TS1514: Expected a capturing group name. | ||
| ~~~~~~~~ | ||
| !!! error TS1538: Unicode escape sequences are only available when the Unicode (u) flag or the Unicode Sets (v) flag is set. |
There was a problem hiding this comment.
The first escape sequence error is not reported as it starts at the same position as the TS1514 error. Is it really desirable? Personally, I would prefer to have an additional check that the last error is not of length zero, in which case it is not considered to be overlapping.
|
@RyanCavanaugh I understand that your Team is busy working on Corsa, but since you just pinged me regarding another regex issue, would it be possible for you to take some time to have a look at this pull request too? Edit: Due to Ron’s unfortunate situation, please allow me to bring this to the attention of an extra person. @jakebailey? |
|
I intend to review this, yes, though I will note that whatever we add here is going to have to get ported to the new Go compiler so hopefully it's not too complicated or tied to JS strings |
This PR fixes several issues related to identifiers scanning:
For regular expression group name
scanIdentifierhandles that for me. In this PR,scanIdentifieritself is refactored so this is automatically fixed.For all identifiers